Data Analysis Replication Assignment

Information About the Publication

Hopkins 2008: REASSESSING THE MASS OF EXCEPTIONALLY LARGE RODENTS USING TOOTHROW LENGTH AND AREA AS PROXIES FOR BODY MASS

A common tool used by paleontologists to understand the physiology, ecology and evolution of fossil mammals (and also fossil vertebrates) is reconstructing body size. In mammals in particular, this can be done using dimensions of the appendicular skeletal elements and dental morphology. For rodents in particular this most commonly is seen using the first lower molar (m1), but in this paper a proxy for body mass is developed using toothrow length and area for the dentary (lower jaw).

The main objective of this publication was to estimate body mass for extinct rodent taxa from dental toothrow length and area and determine the effectiveness of these proxies compared with previously published estimations of body mass from femur diameter.

This study showed that tooth dimensions are accurate for estimating body mass when looking at interspecific comparisons, and even omre accurate proxies when looking at narrow subclades within Rodentia that are more closely related to a fossil taxa of interest.

In the following code I will replicate the regression analyses for establishing both lower toothrow length and rectangular toothrow area as proxies for estimating body mass. The fossil taxa utilized in this study did not have dental measurements provided so could not be included in this replication assignment, but summary statistics and visualizations of the proxy data can be found below.

Libraries Used

library(tidyverse)
library(dplyr)
library(tidyr)
library(cowplot)
library(broom)

library(sjPlot)
library(sjmisc)
library(sjlabelled)
library(sjstats)

library(jtools)

Loading in Dataset

rodents <- "https://raw.githubusercontent.com/allyboville3/Data-Replication-Assignment/main/Hopkins%202008%20-%20Appendix1.csv"

rodent_df <- read_csv(rodents, col_names = TRUE)

head(rodent_df)
## # A tibble: 6 × 7
##   Species         `MVZ specimen` `mass (g)` `LTRL (mm)` `M1 width (mm)` subclade
##   <chr>                    <dbl> <chr>            <dbl>           <dbl> <chr>   
## 1 Aplodontia rufa         172402 1465              17.5            4.13 Aplodon…
## 2 Aplodontia rufa          22624 1375              18.2            3.83 Aplodon…
## 3 Aplodontia rufa          22626 1350              17.9            3.84 Aplodon…
## 4 Aplodontia rufa          22627 1048.2            17.5            3.61 Aplodon…
## 5 Aplodontia rufa         116278 1069              17.6            3.35 Aplodon…
## 6 Aplodontia rufa          96062 832               17.1            4.01 Aplodon…
## # ℹ 1 more variable: `pub. avg. mass (g)` <chr>

Cleaning Up the Data for All Species Analysis

Create columns for Rectangular Toothrow Area (RTRA)

RTRA = (LTRL X M1 width)

With the acknowledgement that we are making the assumption that rodent cheek teeth are essentially rectangular in shape (and that this may result in an overestimation of body mass), area was calculated by multiplying the length of the toothrow by the width of the first lower molar. The first is most often the largest while the first and second molar are the widest. M1 width in this data refers to either the first or second molar width which can be assumed to be the same and representative of the maximum width of the toothrow.

rodent_with_RTRA <- rodent_df %>% 
  mutate(RTRA = (`LTRL (mm)` * `M1 width (mm)`))

rodent_with_RTRA
## # A tibble: 425 × 8
##    Species        `MVZ specimen` `mass (g)` `LTRL (mm)` `M1 width (mm)` subclade
##    <chr>                   <dbl> <chr>            <dbl>           <dbl> <chr>   
##  1 Aplodontia ru…         172402 1465              17.5            4.13 Aplodon…
##  2 Aplodontia ru…          22624 1375              18.2            3.83 Aplodon…
##  3 Aplodontia ru…          22626 1350              17.9            3.84 Aplodon…
##  4 Aplodontia ru…          22627 1048.2            17.5            3.61 Aplodon…
##  5 Aplodontia ru…         116278 1069              17.6            3.35 Aplodon…
##  6 Aplodontia ru…          96062 832               17.1            4.01 Aplodon…
##  7 Aplodontia ru…         114341 926               16.4            3.99 Aplodon…
##  8 Aplodontia ru…          96058 760.7             17.2            3.82 Aplodon…
##  9 Aplodontia ru…          96063 820.9             16.3            3.68 Aplodon…
## 10 Aplodontia ru…         115540 741               16.8            3.76 Aplodon…
## # ℹ 415 more rows
## # ℹ 2 more variables: `pub. avg. mass (g)` <chr>, RTRA <dbl>

Averaging Values and Cleaning Up Spreadsheet for Interspecises Comparisons

rodent_avg <- rodent_with_RTRA %>% 
  mutate(Mass = as.numeric(`mass (g)`), Pub_avg_mass = as.numeric(`pub. avg. mass (g)`), RTRA = as.numeric(`RTRA`)) %>% #change data for Mass, Published Average Mass, and Retangular Toothrow Area from type chr to dbl
  group_by(Species) %>% #grouping data by Species
  tidyr::fill(Pub_avg_mass, .direction ="down") %>% #filling in NA's in published averaged masses to make the following code happy
  #"The average body mass reported by Ernest (2003) was used for 3 species (Hydrochoerus hydrochaeris, Graphiurus murinus, and Cryptomys hottentotus) for which museum specimens lacked data on body mass." - Copied from Hopkins 2008
  mutate(Mass = coalesce(Mass,Pub_avg_mass)) %>% #Sub NA values for Mass with Previously published mass averages
  summarize(
    Mass = mean(`Mass`), #averages mass for each species
    LTRL = mean(`LTRL (mm)`), #averages toothrow length for each species
    `Toothrow_Width` = mean(`M1 width (mm)`), #averages toothrow width (based off of M1 width) for each species
    RTRA = mean(RTRA)
    ) %>% 
  mutate(across(where(is.numeric), ~ round(., 2))) # rounding data to only have two decimal places

rodent_avg
## # A tibble: 75 × 5
##    Species                      Mass  LTRL Toothrow_Width   RTRA
##    <chr>                       <dbl> <dbl>          <dbl>  <dbl>
##  1 Aethomys hindei              98.4  6.29           1.84  11.6 
##  2 Akodon subfuscus             18.8  4.08           1      4.09
##  3 Ammospermophilus leucurus   110.   6.31           1.93  12.2 
##  4 Aplodontia rufa            1128.  17.4            3.79  66.0 
##  5 Apodemus semotus             27    4.32           1.22   5.25
##  6 Arborimus pomo               27.1  5.81           1.29   7.55
##  7 Arvicanthus abysinnicus      82.2  6.24           1.77  11.0 
##  8 Berylomys bowersi           426.   9.11           2.49  22.7 
##  9 Castor canadensis         16060   32.7            7.81 256.  
## 10 Clethrionomys gappperi       24.1  5.06           1.01   5.09
## # ℹ 65 more rows

~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Regression Models

~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. All Species

Running Regression Model for All Species

interspecies_lm <- lm(log(Mass) ~ log(LTRL), data = rodent_avg) #linear regression model 


s <- summary(interspecies_lm)
tidy(s, conf.int = TRUE) #TEST to take a look at confidence intervals
## # A tibble: 2 × 7
##   term        estimate std.error statistic  p.value conf.low conf.high
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 (Intercept)   -0.709    0.172      -4.12 9.81e- 5    -1.05    -0.366
## 2 log(LTRL)      2.79     0.0829     33.6  3.58e-46     2.62     2.95
tidy(interspecies_lm) #tables showing summary statistsics for this lm
## # A tibble: 2 × 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)   -0.709    0.172      -4.12 9.81e- 5
## 2 log(LTRL)      2.79     0.0829     33.6  3.58e-46
glance(interspecies_lm)
## # A tibble: 1 × 12
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1     0.939         0.939 0.426     1131. 3.58e-46     1  -41.5  88.9  95.9
## # ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

Plotting Mass as a function of Lower toothrow length and Linear Model

p_All <- ggplot(data = rodent_avg, aes(x = log(LTRL), y = log(Mass))) + 
  geom_point() + 
  geom_smooth(method = "lm", formula = y ~ x) +
  xlim(0,5) + #matching dimensions of plot to figure from publication
  ylim(0,12) +
  theme_classic(base_size = 16)

p_All

# 2. Species Less than 5kg Essentially is removing the four largest rodent species. This was done because for large rodents there may be different factors influencing dentition shape and size compared to smaller rodents. Including these larger taxa may result in the regression being misleading.

#removing four largest rodent species from data
rodent_no_outlier <- rodent_avg %>% 
  filter(!Species == "Erithizon dorsatum", !Species == "Castor canadensis", !Species == "Hydrochoeris hydrochoeris", !Species == "Cuniculus paca")

rodent_no_outlier
## # A tibble: 71 × 5
##    Species                     Mass  LTRL Toothrow_Width  RTRA
##    <chr>                      <dbl> <dbl>          <dbl> <dbl>
##  1 Aethomys hindei             98.4  6.29           1.84 11.6 
##  2 Akodon subfuscus            18.8  4.08           1     4.09
##  3 Ammospermophilus leucurus  110.   6.31           1.93 12.2 
##  4 Aplodontia rufa           1128.  17.4            3.79 66.0 
##  5 Apodemus semotus            27    4.32           1.22  5.25
##  6 Arborimus pomo              27.1  5.81           1.29  7.55
##  7 Arvicanthus abysinnicus     82.2  6.24           1.77 11.0 
##  8 Berylomys bowersi          426.   9.11           2.49 22.7 
##  9 Clethrionomys gappperi      24.1  5.06           1.01  5.09
## 10 Clethrionomys rutilus       27    5.4            1.09  5.89
## # ℹ 61 more rows

Running regression for Species separated by mass < 5kg

interspecies_small5kg_lm <- lm(log(Mass) ~ log(LTRL), data = rodent_no_outlier)

tidy(interspecies_small5kg_lm)
## # A tibble: 2 × 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)   -0.615     0.221     -2.79 6.86e- 3
## 2 log(LTRL)      2.73      0.113     24.1  2.58e-35
glance(interspecies_small5kg_lm)
## # A tibble: 1 × 12
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1     0.894         0.892 0.428      580. 2.58e-35     1  -39.4  84.8  91.6
## # ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

Plotting Mass as a funtion of LTRL for rodents with mass < 5kg

p_no_outlier <- ggplot(data = rodent_no_outlier, aes(x = log(LTRL), y = log(Mass))) + 
  geom_point() + 
  geom_smooth(method = "lm", formula = y ~ x) +
  xlim(0,4) +
  ylim(0,9) +
  theme_classic(base_size = 16)

p_no_outlier

# 3. Species less than 500g Filtering data to only all species with mass <500g

rodent_smol <- rodent_avg %>% 
  filter(Mass < 500)

rodent_smol 
## # A tibble: 61 × 5
##    Species                    Mass  LTRL Toothrow_Width  RTRA
##    <chr>                     <dbl> <dbl>          <dbl> <dbl>
##  1 Aethomys hindei            98.4  6.29           1.84 11.6 
##  2 Akodon subfuscus           18.8  4.08           1     4.09
##  3 Ammospermophilus leucurus 110.   6.31           1.93 12.2 
##  4 Apodemus semotus           27    4.32           1.22  5.25
##  5 Arborimus pomo             27.1  5.81           1.29  7.55
##  6 Arvicanthus abysinnicus    82.2  6.24           1.77 11.0 
##  7 Berylomys bowersi         426.   9.11           2.49 22.7 
##  8 Clethrionomys gappperi     24.1  5.06           1.01  5.09
##  9 Clethrionomys rutilus      27    5.4            1.09  5.89
## 10 Cryptomys hottentotus     132.   6.08           1.89 11.5 
## # ℹ 51 more rows

Running regression for Species separated by mass < 500g

interspecies_smol_lm <- lm(log(Mass) ~ log(LTRL), data = rodent_smol)

tidy(interspecies_smol_lm)
## # A tibble: 2 × 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)   -0.290     0.284     -1.02 3.11e- 1
## 2 log(LTRL)      2.53      0.157     16.1  2.54e-23
glance(interspecies_smol_lm)
## # A tibble: 1 × 12
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1     0.815         0.812 0.434      261. 2.54e-23     1  -34.6  75.2  81.5
## # ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

Plotting mass as a function of LTRL for species with a mass < 500g

p_rodent_smol <- ggplot(data = rodent_smol, aes(x = log(LTRL), y = log(Mass))) + 
  geom_point() + 
  geom_smooth(method = "lm", formula = y ~ x) +
  xlim(0,3) +
  ylim(0,7) +
  theme_classic(base_size = 16)

p_rodent_smol

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Subclade Grouping 1 - Rodents within Muroidea

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Filtering Data to Retain only families within Muroidea

Have to do so with original version of df because the subclade column was removed for previous code chunks

rodent_muriods <- rodent_with_RTRA %>% 
  filter(subclade == "Murinae" | subclade == "Neotominae" | subclade == "Arvicolinae" | subclade == "Deomyinae" | subclade == "Gerbillinae" | subclade == "Sigmodontinae" | subclade == "Spalacinae") #specifying subclades within Muroidea

rodent_muriods
## # A tibble: 225 × 8
##    Species        `MVZ specimen` `mass (g)` `LTRL (mm)` `M1 width (mm)` subclade
##    <chr>                   <dbl> <chr>            <dbl>           <dbl> <chr>   
##  1 Arborimus pomo          99621 24.5              5.85            1.31 Arvicol…
##  2 Arborimus pomo          99622 24.2              5.97            1.33 Arvicol…
##  3 Arborimus pomo          99623 22.3              5.56            1.14 Arvicol…
##  4 Arborimus pomo          99624 26.3              5.48            1.2  Arvicol…
##  5 Arborimus pomo         120595 38.3              6.2             1.49 Arvicol…
##  6 Clethrionomys…          30723 18.6              4.94            0.92 Arvicol…
##  7 Clethrionomys…          30724 20.6              5.06            1.09 Arvicol…
##  8 Clethrionomys…          30725 20                5.13            1.03 Arvicol…
##  9 Clethrionomys…          30726 38.6              5.22            1.04 Arvicol…
## 10 Clethrionomys…          30727 22.5              4.93            0.95 Arvicol…
## # ℹ 215 more rows
## # ℹ 2 more variables: `pub. avg. mass (g)` <chr>, RTRA <dbl>

Creating a new df of averages based off of species in Muroidea subclades

r_muriod_avg <-  rodent_muriods %>% 
  mutate(Mass = as.numeric(`mass (g)`), 
         Pub_avg_mass = as.numeric(`pub. avg. mass (g)`), 
         RTRA = as.numeric(`RTRA`)) %>% #change data for Mass, Published Average Mass, and Retangular Toothrow Area from type chr to dbl
  group_by(Species) %>% #grouping data by Species
  tidyr::fill(Pub_avg_mass, .direction ="down") %>% #filling in NA's in published averaged masses to make the following code happy
  mutate(Mass = coalesce(Mass,Pub_avg_mass)) %>% #Sub NA values for Mass with Previously published mass averages
  summarize(
    Mass = mean(`Mass`), #averages mass for each species
    LTRL = mean(`LTRL (mm)`), #averages toothrow length for each species
    `Toothrow_Width` = mean(`M1 width (mm)`), #averages toothrow width (based off of M1 width) for each species
    RTRA = mean(RTRA)
    ) %>% 
  mutate(across(where(is.numeric), ~ round(., 2))) # rounding data to only have two decimal places

r_muriod_avg
## # A tibble: 43 × 5
##    Species                    Mass  LTRL Toothrow_Width  RTRA
##    <chr>                     <dbl> <dbl>          <dbl> <dbl>
##  1 Aethomys hindei            98.4  6.29           1.84 11.6 
##  2 Akodon subfuscus           18.8  4.08           1     4.09
##  3 Apodemus semotus           27    4.32           1.22  5.25
##  4 Arborimus pomo             27.1  5.81           1.29  7.55
##  5 Arvicanthus abysinnicus    82.2  6.24           1.77 11.0 
##  6 Berylomys bowersi         426.   9.11           2.49 22.7 
##  7 Clethrionomys gappperi     24.1  5.06           1.01  5.09
##  8 Clethrionomys rutilus      27    5.4            1.09  5.89
##  9 Dicrostonyx groenlandicus  65.1  6.94           1.3   9.08
## 10 Graomys griseoflavus       54.4  5.36           1.47  7.9 
## # ℹ 33 more rows

4. Muroid Rodents with mass <5kg

Should be the same as the df above

rodent_lt5kg_muroid <- r_muriod_avg %>% 
  filter(Mass < 5000) #all murine rodents present in this study have a body mass of less than 5kg

rodent_lt5kg_muroid
## # A tibble: 43 × 5
##    Species                    Mass  LTRL Toothrow_Width  RTRA
##    <chr>                     <dbl> <dbl>          <dbl> <dbl>
##  1 Aethomys hindei            98.4  6.29           1.84 11.6 
##  2 Akodon subfuscus           18.8  4.08           1     4.09
##  3 Apodemus semotus           27    4.32           1.22  5.25
##  4 Arborimus pomo             27.1  5.81           1.29  7.55
##  5 Arvicanthus abysinnicus    82.2  6.24           1.77 11.0 
##  6 Berylomys bowersi         426.   9.11           2.49 22.7 
##  7 Clethrionomys gappperi     24.1  5.06           1.01  5.09
##  8 Clethrionomys rutilus      27    5.4            1.09  5.89
##  9 Dicrostonyx groenlandicus  65.1  6.94           1.3   9.08
## 10 Graomys griseoflavus       54.4  5.36           1.47  7.9 
## # ℹ 33 more rows

Running regression for Muroid rodents <5kg

muroid_lt5kg_lm <- lm(log(Mass) ~ log(LTRL), data = rodent_lt5kg_muroid)

tidy(muroid_lt5kg_lm)
## # A tibble: 2 × 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)   -0.679     0.356     -1.91 6.35e- 2
## 2 log(LTRL)      2.73      0.195     14.0  3.10e-17
glance(muroid_lt5kg_lm)
## # A tibble: 1 × 12
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1     0.827         0.823 0.476      197. 3.10e-17     1  -28.0  62.1  67.4
## # ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
p_muroid_lt5kg <- ggplot(data = rodent_lt5kg_muroid, aes(x = log(LTRL), y = log(Mass))) + 
  geom_point() + 
  geom_smooth(method = "lm", formula = y ~ x) +
  xlim(0,3) +
  ylim(0,8) +
  theme_classic(base_size = 16)

p_muroid_lt5kg

5. Muroid Rodents with mass <500g

Should be the same as the df above

rodent_lt500g_muroid <- r_muriod_avg %>% 
  filter(Mass < 500)

rodent_lt500g_muroid
## # A tibble: 40 × 5
##    Species                    Mass  LTRL Toothrow_Width  RTRA
##    <chr>                     <dbl> <dbl>          <dbl> <dbl>
##  1 Aethomys hindei            98.4  6.29           1.84 11.6 
##  2 Akodon subfuscus           18.8  4.08           1     4.09
##  3 Apodemus semotus           27    4.32           1.22  5.25
##  4 Arborimus pomo             27.1  5.81           1.29  7.55
##  5 Arvicanthus abysinnicus    82.2  6.24           1.77 11.0 
##  6 Berylomys bowersi         426.   9.11           2.49 22.7 
##  7 Clethrionomys gappperi     24.1  5.06           1.01  5.09
##  8 Clethrionomys rutilus      27    5.4            1.09  5.89
##  9 Dicrostonyx groenlandicus  65.1  6.94           1.3   9.08
## 10 Graomys griseoflavus       54.4  5.36           1.47  7.9 
## # ℹ 30 more rows

Running regression for Muroid rodents <500g

muroid_lt500g_lm <- lm(log(Mass) ~ log(LTRL), data = rodent_lt500g_muroid)

tidy(muroid_lt5kg_lm)
## # A tibble: 2 × 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)   -0.679     0.356     -1.91 6.35e- 2
## 2 log(LTRL)      2.73      0.195     14.0  3.10e-17
glance(muroid_lt5kg_lm)
## # A tibble: 1 × 12
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1     0.827         0.823 0.476      197. 3.10e-17     1  -28.0  62.1  67.4
## # ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
p_muroid_lt500g <- ggplot(data = rodent_lt500g_muroid, aes(x = log(LTRL), y = log(Mass))) + 
  geom_point() + 
  geom_smooth(method = "lm", formula = y ~ x) +
  xlim(0,3) +
  ylim(0,8) +
  theme_classic(base_size = 16)

p_muroid_lt500g

# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # Subclade Grouping 2 - Non-muroid Rodents # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Filtering Data to be only non-muroid species

rodent_non_muriods <- rodent_with_RTRA %>% 
  filter(!subclade == "Murinae" , !subclade == "Neotominae" , !subclade == "Arvicolinae" , !subclade == "Deomyinae" , !subclade == "Gerbillinae" , !subclade == "Sigmodontinae" , !subclade == "Spalacinae") #removing all subclades belonging to Muroidea

rodent_non_muriods
## # A tibble: 200 × 8
##    Species        `MVZ specimen` `mass (g)` `LTRL (mm)` `M1 width (mm)` subclade
##    <chr>                   <dbl> <chr>            <dbl>           <dbl> <chr>   
##  1 Aplodontia ru…         172402 1465              17.5            4.13 Aplodon…
##  2 Aplodontia ru…          22624 1375              18.2            3.83 Aplodon…
##  3 Aplodontia ru…          22626 1350              17.9            3.84 Aplodon…
##  4 Aplodontia ru…          22627 1048.2            17.5            3.61 Aplodon…
##  5 Aplodontia ru…         116278 1069              17.6            3.35 Aplodon…
##  6 Aplodontia ru…          96062 832               17.1            4.01 Aplodon…
##  7 Aplodontia ru…         114341 926               16.4            3.99 Aplodon…
##  8 Aplodontia ru…          96058 760.7             17.2            3.82 Aplodon…
##  9 Aplodontia ru…          96063 820.9             16.3            3.68 Aplodon…
## 10 Aplodontia ru…         115540 741               16.8            3.76 Aplodon…
## # ℹ 190 more rows
## # ℹ 2 more variables: `pub. avg. mass (g)` <chr>, RTRA <dbl>

Creating df of averages for non-muroid rodents

r_non_muriod_avg <-  rodent_non_muriods %>% 
  mutate(Mass = as.numeric(`mass (g)`), 
         Pub_avg_mass = as.numeric(`pub. avg. mass (g)`), 
         RTRA = as.numeric(`RTRA`)) %>% #change data for Mass, Published Average Mass, and Retangular Toothrow Area from type chr to dbl
  group_by(Species) %>% #grouping data by Species
  tidyr::fill(Pub_avg_mass, .direction ="down") %>% #filling in NA's in published averaged masses to make the following code happy
  #"The average body mass reported by Ernest (2003) was used for 3 species (Hydrochoerus hydrochaeris, Graphiurus murinus, and Cryptomys hottentotus) for which museum specimens lacked data on body mass." - Copied from Hopkins 2008
  mutate(Mass = coalesce(Mass,Pub_avg_mass)) %>% #Sub NA values for Mass with Previously published mass averages
  summarize(
    Mass = mean(`Mass`), #averages mass for each species
    LTRL = mean(`LTRL (mm)`), #averages toothrow length for each species
    `Toothrow_Width` = mean(`M1 width (mm)`), #averages toothrow width (based off of M1 width) for each species
    RTRA = mean(RTRA)
    ) %>% 
  mutate(across(where(is.numeric), ~ round(., 2))) # rounding data to only have two decimal places

r_non_muriod_avg
## # A tibble: 32 × 5
##    Species                      Mass  LTRL Toothrow_Width   RTRA
##    <chr>                       <dbl> <dbl>          <dbl>  <dbl>
##  1 Ammospermophilus leucurus   110.   6.31           1.93  12.2 
##  2 Aplodontia rufa            1128.  17.4            3.79  66.0 
##  3 Castor canadensis         16060   32.7            7.81 256.  
##  4 Cryptomys hottentotus       132.   6.08           1.89  11.5 
##  5 Ctenomys maulinus           208.  10.7            2.58  27.8 
##  6 Cuniculus paca             8860   30.8            7.05 218.  
##  7 Cynomys mexicanus           829.  13.7            4.37  59.8 
##  8 Dipodomys merriami           39.0  4.47           1.69   7.57
##  9 Erithizon dorsatum         6973.  30.5            6.32 193.  
## 10 Galea musteloides           341   12.6            2.51  31.7 
## # ℹ 22 more rows

6. Creating df of Non-muroid rodents of mass <5kg

rodent_lt5kg_non_muroid <- r_non_muriod_avg %>% 
  filter(Mass < 5000)

rodent_lt5kg_non_muroid
## # A tibble: 28 × 5
##    Species                     Mass  LTRL Toothrow_Width  RTRA
##    <chr>                      <dbl> <dbl>          <dbl> <dbl>
##  1 Ammospermophilus leucurus  110.   6.31           1.93 12.2 
##  2 Aplodontia rufa           1128.  17.4            3.79 66.0 
##  3 Cryptomys hottentotus      132.   6.08           1.89 11.5 
##  4 Ctenomys maulinus          208.  10.7            2.58 27.8 
##  5 Cynomys mexicanus          829.  13.7            4.37 59.8 
##  6 Dipodomys merriami          39.0  4.47           1.69  7.57
##  7 Galea musteloides          341   12.6            2.51 31.7 
##  8 Geomys arenarius           158.   7.07           2.07 14.7 
##  9 Glaucomys sabrinus         130.   8.09           2.09 16.9 
## 10 Graphiurus murinus          24    3.22           1.14  3.68
## # ℹ 18 more rows

Running regression for non-muroid rodents of mass < 5kg and plotting mass as a function of LTRL

non_muroid_lt5kg_lm <- lm(log(Mass) ~ log(LTRL), data = rodent_lt5kg_non_muroid)

tidy(non_muroid_lt5kg_lm)
## # A tibble: 2 × 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)   -0.299     0.262     -1.14 2.64e- 1
## 2 log(LTRL)      2.63      0.124     21.3  5.81e-18
glance(non_muroid_lt5kg_lm)
## # A tibble: 1 × 12
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1     0.946         0.944 0.328      452. 5.81e-18     1  -7.46  20.9  24.9
## # ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
p_non_muroid_lt5kg <- ggplot(data = rodent_lt5kg_non_muroid, aes(x = log(LTRL), y = log(Mass))) + 
  geom_point() + 
  geom_smooth(method = "lm", formula = y ~ x) +
  xlim(0,4) +
  ylim(0,9) +
  theme_classic(base_size = 16)

p_non_muroid_lt5kg

7. Creating df of Non-muroid rodents of mass <5kg

rodent_lt500g_non_muroid <- r_non_muriod_avg %>% 
  filter(Mass < 500)

rodent_lt500g_non_muroid
## # A tibble: 21 × 5
##    Species                    Mass  LTRL Toothrow_Width  RTRA
##    <chr>                     <dbl> <dbl>          <dbl> <dbl>
##  1 Ammospermophilus leucurus 110.   6.31           1.93 12.2 
##  2 Cryptomys hottentotus     132.   6.08           1.89 11.5 
##  3 Ctenomys maulinus         208.  10.7            2.58 27.8 
##  4 Dipodomys merriami         39.0  4.47           1.69  7.57
##  5 Galea musteloides         341   12.6            2.51 31.7 
##  6 Geomys arenarius          158.   7.07           2.07 14.7 
##  7 Glaucomys sabrinus        130.   8.09           2.09 16.9 
##  8 Graphiurus murinus         24    3.22           1.14  3.68
##  9 Liomys pictus              33.5  4.54           1.54  6.98
## 10 Mesomys hispidus          157.   7.8            1.87 14.6 
## # ℹ 11 more rows

Running regression for non-muroid rodents of mass < 500g and plotting mass as a function of LTRL

non_muroid_lt500g_lm <- lm(log(Mass) ~ log(LTRL), data = rodent_lt500g_non_muroid)

tidy(non_muroid_lt5kg_lm)
## # A tibble: 2 × 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)   -0.299     0.262     -1.14 2.64e- 1
## 2 log(LTRL)      2.63      0.124     21.3  5.81e-18
glance(non_muroid_lt5kg_lm)
## # A tibble: 1 × 12
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1     0.946         0.944 0.328      452. 5.81e-18     1  -7.46  20.9  24.9
## # ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
p_non_muroid_lt500g <- ggplot(data = rodent_lt500g_non_muroid, aes(x = log(LTRL), y = log(Mass))) + 
  geom_point() + 
  geom_smooth(method = "lm", formula = y ~ x) +
  xlim(0,4) +
  ylim(0,9) +
  theme_classic(base_size = 16)

p_non_muroid_lt500g

All Stats Summary table for LTRL Regressions

#Making a nice table of all Stats using sjStats package
tab_model(interspecies_lm, interspecies_small5kg_lm, interspecies_smol_lm, muroid_lt5kg_lm, non_muroid_lt5kg_lm, muroid_lt500g_lm,non_muroid_lt500g_lm, 
          show.intercept = TRUE, show.est = TRUE, show.ci = 0.95, show.se = TRUE, show.p = TRUE, show.r2 = TRUE, show.re.var = TRUE,
          dv.labels = c("All Species", "All <5kg", "All <500g", "Muroidea <5kg", "Non-Muroidea <5kg", "Muroidea <500g", "Non-Muroidea <500g"))
  All Species All <5kg All <500g Muroidea <5kg Non-Muroidea <5kg Muroidea <500g Non-Muroidea <500g
Predictors Estimates std. Error CI p Estimates std. Error CI p Estimates std. Error CI p Estimates std. Error CI p Estimates std. Error CI p Estimates std. Error CI p Estimates std. Error CI p
(Intercept) -0.71 0.17 -1.05 – -0.37 <0.001 -0.62 0.22 -1.06 – -0.17 0.007 -0.29 0.28 -0.86 – 0.28 0.311 -0.68 0.36 -1.40 – 0.04 0.064 -0.30 0.26 -0.84 – 0.24 0.264 -0.57 0.45 -1.47 – 0.34 0.212 0.19 0.28 -0.39 – 0.76 0.508
LTRL [log] 2.79 0.08 2.62 – 2.95 <0.001 2.73 0.11 2.51 – 2.96 <0.001 2.53 0.16 2.22 – 2.84 <0.001 2.73 0.19 2.34 – 3.13 <0.001 2.63 0.12 2.37 – 2.88 <0.001 2.66 0.25 2.15 – 3.18 <0.001 2.32 0.14 2.02 – 2.62 <0.001
Observations 75 71 61 43 28 40 21
R2 / R2 adjusted 0.939 / 0.939 0.894 / 0.892 0.815 / 0.812 0.827 / 0.823 0.946 / 0.944 0.743 / 0.736 0.932 / 0.928
Table 2 Hopkins 2008
Table 2 Hopkins 2008

Comparing my results with Table 2 from the paper, you can see that the values are quite similar for all datasets and statistical parameters. One thing I do want to note is the difference in the d.f for Non-Muroidea (both mass groupings). I checked the data multiple times and I think this must have been a small error in the publication because both the dataset provided in the publication and all of my data loaded into R has corresponded to the numbers provided in my analysis. This change appears to have minimal to no impact on my results but is still worth noting as it is a distinct difference.

Confidence Intervals Hopkins 2008
Confidence Intervals Hopkins 2008
Table 5 Hopkins 2008
Table 5 Hopkins 2008

The author of this paper did not end up providing confidence interval values but provided the equations (pictured above) and a table of the values for each variable across all analysis subgroups for both LTRL and RTRA (Table 5 - above). I attempted to determine how she calculated some of these variables but was unable to determine how she calculated the standard error of the regression (and how this is different from the standard error of the regression slope and intercept) and am honestly not sure how to proceed. I did calculate the confidence intervals but am unable to compare with Dr.Hopkins and determine how accurate my calculations are.

Showing all plots together

detach("package:sjPlot", unload=TRUE) # detaching sjPlot package to remove conflicts with plot_grid function below

bottom_square <- plot_grid(p_no_outlier, p_rodent_smol, p_non_muroid_lt5kg, p_muroid_lt5kg, 
                          labels = c("B. <5kg", "C. <500g", "D. non-muroids <5kg", "E. muroids <5kg"),
                          label_size = 16, 
                          hjust = 0, 
                          label_x = 0.15)


plot_grid(p_All, bottom_square, labels = c("A. all species"), label_size = 16, ncol = 1, hjust = 0, 
       label_x = 0.1, rel_heights = c(1, 1))

Figure 2 Hopkins 2008
Figure 2 Hopkins 2008

Comparing my analyses with Figure 2 from Hopkins 2008 you can see that my plots are quite similar if not exactly the same!! Across all groupings there is a significant and clear linear relationship between lower toothrow lenght and body mass. Woohoo for data replication being successful!!

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

RTRA Analyses

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All Species - RTRA

interspecies_rtra_lm <- lm(log(Mass) ~ log(RTRA), data = rodent_avg)


tidy(interspecies_rtra_lm)
## # A tibble: 2 × 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)    0.923    0.0931      9.92 3.64e-15
## 2 log(RTRA)      1.50     0.0328     45.7  1.92e-55
glance(interspecies_rtra_lm)
## # A tibble: 1 × 12
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1     0.966         0.966 0.318     2086. 1.92e-55     1  -19.5  45.1  52.0
## # ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
p_All_RTRA <- ggplot(data = rodent_avg, aes(x = log(RTRA), y = log(Mass))) + 
  geom_point() + 
  geom_smooth(method = "lm", formula = y ~ x) +
  xlim(0,7) +
  ylim(0,12) +
  theme_classic(base_size = 16)

p_All_RTRA

8. Species Less than 5kg

interspecies_small5kg_rtra_lm <- lm(log(Mass) ~ log(RTRA), data = rodent_no_outlier)

tidy(interspecies_small5kg_rtra_lm)
## # A tibble: 2 × 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)     1.03    0.113       9.14 1.65e-13
## 2 log(RTRA)       1.45    0.0437     33.2  3.93e-44
glance(interspecies_small5kg_rtra_lm)
## # A tibble: 1 × 12
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1     0.941         0.940 0.319     1100. 3.93e-44     1  -18.5  43.1  49.8
## # ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
p_interspecies_small5kg_rtra <- ggplot(data = rodent_no_outlier, aes(x = log(RTRA), y = log(Mass))) + 
  geom_point() + 
  geom_smooth(method = "lm", formula = y ~ x) +
  xlim(0,5) +
  ylim(0,9) +
  theme_classic(base_size = 16)

p_interspecies_small5kg_rtra

9. Species Less than 500g

interspecies_lt500g_rtra_lm <- lm(log(Mass) ~ log(RTRA), data = rodent_smol)

tidy(interspecies_lt500g_rtra_lm)
## # A tibble: 2 × 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)     1.05    0.148       7.12 1.72e- 9
## 2 log(RTRA)       1.44    0.0649     22.1  2.81e-30
glance(interspecies_lt500g_rtra_lm)
## # A tibble: 1 × 12
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1     0.893         0.891 0.331      491. 2.81e-30     1  -18.1  42.2  48.5
## # ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
p_interspecies_lt500_rtra <- ggplot(data = rodent_smol, aes(x = log(RTRA), y = log(Mass))) + 
  geom_point() + 
  geom_smooth(method = "lm", formula = y ~ x) +
  xlim(0,4) +
  ylim(0,7) +
  theme_classic(base_size = 16)

p_interspecies_lt500_rtra

Summary Stats for all RTRA linear models

library(sjPlot)

tab_model(interspecies_rtra_lm, interspecies_small5kg_rtra_lm, interspecies_lt500g_rtra_lm, 
          show.intercept = TRUE, show.est = TRUE, show.ci = 0.95, show.se = TRUE, show.p = TRUE, show.r2 = TRUE, show.re.var = TRUE,
          dv.labels = c("All Species", "All <5kg", "All <500g"))
  All Species All <5kg All <500g
Predictors Estimates std. Error CI p Estimates std. Error CI p Estimates std. Error CI p
(Intercept) 0.92 0.09 0.74 – 1.11 <0.001 1.03 0.11 0.81 – 1.26 <0.001 1.05 0.15 0.76 – 1.35 <0.001
RTRA [log] 1.50 0.03 1.43 – 1.56 <0.001 1.45 0.04 1.36 – 1.54 <0.001 1.44 0.06 1.31 – 1.57 <0.001
Observations 75 71 61
R2 / R2 adjusted 0.966 / 0.966 0.941 / 0.940 0.893 / 0.891
detach("package:sjPlot", unload=TRUE)

Table 4 Hopkins 2008 Again comparing all statistical paramters you can see that between my replication of analyses and Table 4 (above) that my results returned very similar values and show that RTRA is in fact a significant predictor of Body Mass as well!